Using Multiple Taggers to Improve English Part-of-Speech Tagging by Error Learning

Authors

  • Jun Wu
  • Eric Brill
Abstract

Many approaches to Part-of-Speech tagging have reached an accuracy of about 96-97%, which is close to the upper bound, and little improvement has been made in recent years. In this project, we propose increasing tagging accuracy by learning the differences between the results of multiple taggers. Since the errors of different taggers may be complementary, some of them can be avoided by using different taggers in different contexts. Context patterns that indicate which tagger's result should be used in a given circumstance are learned from the tagging results on the training data and then applied to the test data. By combining trigram, unigram, and rule-based taggers in this way, we reduce the tagging errors of the trigram tagger by 11%. We run many experiments in this project, in which possible ways of improving the tagging result are tested and compared.
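The abstract does not spell out the exact form of the context patterns, so the following is only a minimal sketch of the general idea under a simplifying assumption: the "context" at each token is taken to be just the tuple of tags proposed by the three taggers, and for every such context where the taggers disagree in training, we remember which tagger was most often correct. The function names `learn_patterns` and `combine`, the toy tag sequences, and the use of the trigram tagger (index 0) as the fallback are all hypothetical; a real pattern learner would likely also condition on neighboring words and tags.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical context definition: the tags proposed by each tagger for one token.
Context = Tuple[str, ...]


def learn_patterns(predictions: Sequence[Sequence[str]],
                   gold: Sequence[str]) -> Dict[Context, int]:
    """predictions[k][i] is tagger k's tag for token i.

    Returns a map from each disagreement context seen in training to the
    index of the tagger that was correct most often in that context.
    """
    wins: Dict[Context, Counter] = defaultdict(Counter)
    for i, gold_tag in enumerate(gold):
        context = tuple(p[i] for p in predictions)
        if len(set(context)) == 1:
            continue  # all taggers agree, so there is nothing to choose
        for k, tag in enumerate(context):
            if tag == gold_tag:
                wins[context][k] += 1
    return {ctx: counts.most_common(1)[0][0] for ctx, counts in wins.items()}


def combine(predictions: Sequence[Sequence[str]],
            patterns: Dict[Context, int],
            default: int = 0) -> List[str]:
    """Pick, per token, the tag of the tagger the learned patterns prefer.

    Unseen contexts fall back to tagger `default` (assumed here to be the
    trigram tagger at index 0).
    """
    combined = []
    for i in range(len(predictions[0])):
        context = tuple(p[i] for p in predictions)
        combined.append(context[patterns.get(context, default)])
    return combined


# Toy training data: order is (trigram, unigram, rule-based); tags are made up.
train_preds = [
    ["DT", "JJ", "NN", "JJ"],  # trigram
    ["DT", "NN", "NN", "NN"],  # unigram
    ["DT", "JJ", "NN", "JJ"],  # rule-based
]
train_gold = ["DT", "NN", "NN", "NN"]
patterns = learn_patterns(train_preds, train_gold)

test_preds = [["DT", "JJ", "VB"], ["DT", "NN", "VB"], ["DT", "JJ", "VB"]]
print(combine(test_preds, patterns))  # ['DT', 'NN', 'VB']
```

In this toy run the unigram tagger is right whenever the taggers disagree as (JJ, NN, JJ), so that context is routed to it, while every other context keeps the trigram tagger's answer.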


Similar resources

Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data

Part-of-speech information is a pre-requisite in many NLP algorithms. However, Twitter text is difficult to part-of-speech tag: it is noisy, with linguistic errors and idiosyncratic style. We present a detailed error analysis of existing taggers, motivating a series of tagger augmentations which are demonstrated to improve performance. We identify and evaluate techniques for improving English p...

Icelandic Data Driven Part of Speech Tagging

Data-driven POS tagging has achieved good performance for English, but can still lag behind linguistic rule-based taggers for morphologically complex languages, such as Icelandic. We extend a statistical tagger to handle fine-grained tagsets and improve over the best Icelandic POS tagger. Additionally, we develop a case tagger for non-local case and gender decisions. An error analysis of our sy...

Performance Analysis of a Part of Speech Tagging Task

In this paper, we attempt to make a formal analysis of the performance in automatic part of speech tagging. Lower and upper bounds in tagging precision using existing taggers or their combination are provided. Since we show that with existing taggers, automatic perfect tagging is not possible, we offer two solutions for applications requiring very high precision: (1) a solution involving minimu...

Improving POS Tagging Using Machine-Learning Techniques

In this paper we show how machine learning techniques for constructing and combining several classifiers can be applied to improve the accuracy of an existing English POS tagger (Màrquez and Rodríguez). Additionally, the problem of data sparseness is also addressed by applying a technique of generating convex pseudo data (Breiman). Experimental results and a comparison to other state of the art tagg...

Fast Domain Adaptation for Part of Speech Tagging for Dialogues

Part of speech tagging accuracy deteriorates severely when a tagger is used out of domain. We investigate a fast method for domain adaptation, which provides additional in-domain training data from an unannotated data set by applying POS taggers with different biases to the unannotated data set and then choosing the set of sentences on which the taggers agree. We show that we improve the accura...

Publication date: 1997